Systematic benchmark of substructure search in molecular graphs - From Ullmann to VF2

Authors

  • Hans-Christian Ehrlich
  • Matthias Rarey
Abstract

BACKGROUND: Searching for substructures in molecules is among the most elementary tasks in cheminformatics and is nowadays part of virtually every cheminformatics software package. The underlying algorithms, in use for several decades, were designed for general graphs. Little effort has been spent on characterizing their performance on molecular graphs, so it is not clear how current substructure search algorithms behave on such special graphs. One of the main reasons such an evaluation was not performed in the past was the absence of appropriate data sets.

RESULTS: In this paper, we present a systematic evaluation of Ullmann's and the VF2 subgraph isomorphism algorithms on molecular data. The benchmark set consists of a collection of 1235 SMARTS substructure expressions and selected molecules from the ZINC database. The benchmark evaluates substructure search times for complete database scans as well as for individual substructure-molecule pairs. In detail, we focus on the influence of substructure formulation and size, the impact of molecule size, and the ability of both algorithms to be used on multiple cores.

CONCLUSIONS: The results show a clear superiority of the VF2 algorithm in all test scenarios. In general, both algorithms solve most instances in less than one millisecond, which we consider acceptable. Still, in direct comparison, VF2 is most often several times faster than Ullmann's algorithm. Additionally, Ullmann's algorithm shows a surprising number of run-time outliers.
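To make the notion of timing an individual substructure-molecule pair concrete, here is a minimal Python sketch. It is neither Ullmann's algorithm nor VF2, and it is not the benchmark code from the paper: it is a plain backtracking search without the pruning that distinguishes those two algorithms. The graph encoding (a dictionary of element labels plus a dictionary of neighbour sets), the function name `find_substructure`, and the toy ethanol and C-O inputs are all assumptions made for illustration.

```python
import time


def find_substructure(pattern, molecule):
    """Return one mapping pattern-atom -> molecule-atom, or None if absent.

    Each graph is a pair (labels, adjacency): labels maps an atom index to an
    element symbol, adjacency maps an atom index to the set of its neighbours.
    Plain backtracking, shown only to illustrate the problem being benchmarked.
    """
    p_labels, p_adj = pattern
    m_labels, m_adj = molecule
    order = list(p_labels)  # fixed visiting order of the pattern atoms
    mapping = {}            # partial assignment built up during the search
    used = set()            # molecule atoms already taken by the mapping

    def compatible(p_atom, m_atom):
        # Element symbols must agree.
        if p_labels[p_atom] != m_labels[m_atom]:
            return False
        # Every already-mapped pattern neighbour must map to a neighbour of m_atom.
        return all(mapping[n] in m_adj[m_atom]
                   for n in p_adj[p_atom] if n in mapping)

    def extend(depth):
        if depth == len(order):
            return True  # all pattern atoms are mapped
        p_atom = order[depth]
        for m_atom in m_labels:
            if m_atom not in used and compatible(p_atom, m_atom):
                mapping[p_atom] = m_atom
                used.add(m_atom)
                if extend(depth + 1):
                    return True
                # Undo the tentative assignment and try the next candidate.
                del mapping[p_atom]
                used.discard(m_atom)
        return False

    return dict(mapping) if extend(0) else None


# Hypothetical toy inputs: heavy atoms of ethanol (C-C-O) and a C-O pattern.
ETHANOL = ({0: "C", 1: "C", 2: "O"}, {0: {1}, 1: {0, 2}, 2: {1}})
C_O = ({0: "C", 1: "O"}, {0: {1}, 1: {0}})
NITROGEN = ({0: "N"}, {0: set()})

start = time.perf_counter()
match = find_substructure(C_O, ETHANOL)
elapsed_ms = (time.perf_counter() - start) * 1000.0
print(f"match={match}, search time={elapsed_ms:.4f} ms")
```

A real benchmark of the kind the abstract describes would repeat such timed calls over every pattern-molecule pair and compare two algorithm implementations on identical inputs; this sketch only shows the shape of a single measurement.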

Similar resources

An Effective Path-aware Approach for Keyword Search over Data Graphs

Abstract—Keyword Search is known as a user-friendly alternative for structured languages to retrieve information from graph-structured data. Efficient retrieving of relevant answers to a keyword query and effective ranking of these answers according to their relevance are two main challenges in the keyword search over graph-structured data. In this paper, a novel scoring function is proposed, w...

Application of Graph-Based Chemical Nomenclature to Theoretical and Preparative Chemistry

Development of graph-based systematic names containing mathematical descriptions of molecular graphs is described. Such names can be regarded as compact connection tables. Exploration of the use of graph-based systematic names for information storage and retrieval purposes, in substructure searching, and as an aid in pattern recognition, structure-activity relationships, drug design, etc., is ...

Finding influential individual in Social Network graphs using CSCS algorithm and Shapley value in game theory

In recent years, social network analysis has gained a great deal of attention. Social networks have various applications in different areas, namely predicting disease epidemics, search engines, and viral advertisements. A key property of social networks is that interpersonal relationships can influence the decisions that individuals make. Finding the most influential nodes is important in social networks b...

Solving Graph Isomorphism Using Parameterized Matching

We propose a new approach to solve graph isomorphism using parameterized matching. To find isomorphism between two graphs, one graph is linearized, i.e., represented as a graph walk that covers all nodes and edges such that each element is represented by a parameter. Next, we match the graph linearization on the second graph, searching for a bijective function that maps each element of the firs...

Approximate Substructure Search in a Database of 3D Graphs

Given a database D of three dimensional (3D) graphs and a query graph Q, the problem of substructure search is defined as finding the graphs in D that contain Q. This is an important search operation in scientific databases. This paper extends the search operation to find those graphs D in D that "approximately" contain Q in the presence of rotation, translation, distortion, and node insert/delete i...

Journal title:

Volume 4, Issue

Pages -

Publication date: 2012